********************************************************************************
** 	TITLE:		repprim2016_polls		                                      ** 	
**	AUTHOR:	    Philippe Mongrain                                             **
**  DATE:	    October 2022 						                          **	
**  VERSION:	Stata 16					                                  **	
********************************************************************************

* Version control

version 16.0

* Import data

import excel surveys, firstrow sheet(repprim2016) clear
destring _all, replace

* Generate survey date

gen date = date(polldate, "MDY")
format %tdMon_DD,_CCYY date
drop polldate
rename date polldate
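
* Optional check, added as a suggestion and not part of the original workflow:
* count the poll dates that failed to parse with the "MDY" mask. A nonzero
* count would suggest that some dates in the Excel sheet use another layout.

count if missing(polldate)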

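* Generate election date (edate is stored as long because a float cannot
* represent 20160721 exactly, which would shift the date by one day)

gen long edate = 20160721

gen electiondate = date(string(edate,"%8.0f"),"YMD")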

format %tdMon_DD,_CCYY electiondate

* Days from survey to election

gen time = (electiondate - polldate)
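
* Optional check, added as a suggestion and not part of the original workflow:
* inspect the range of time. A negative minimum would indicate polls dated
* after the election date, which the later steps do not remove.

summarize time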

* Generate mean vote intention value by day

local candidates bush carson christie cruz fiorina gilmore ///
	kasich huckabee paul rubio trump santorum

foreach cand of local candidates {
	bysort time : egen `cand'vote = mean(`cand')
}

sort time

* Keep one observation per poll date (the daily means are constant within date)

duplicates drop polldate, force

* Reshape the dataset

* The daily means, not the raw shares of the poll that happened to be kept,
* supply the vote values

local candidates bush carson christie cruz fiorina gilmore ///
	kasich huckabee paul rubio trump santorum

foreach cand of local candidates {
	rename `cand'vote v_`cand'
}


reshape long v_, i(polldate) j(party) string

rename v_ vote

keep polldate electiondate party vote poll time

order poll polldate electiondate time party vote

* Rank candidates within each poll date, highest vote share first
* (the variable party holds candidate names; ties are broken alphabetically
* so that the ordering is reproducible)

gen double negvote = -vote
sort polldate negvote party
by polldate : gen rank = _n

* Top three candidates on each poll date

by polldate : gen winner = party[1]
by polldate : gen runnerup = party[2]
by polldate : gen thirdplace = party[3]

* Generate poll margin (leader minus runner-up)

by polldate : gen pollmar = vote[1] - vote[2]
drop negvote
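
* Optional check, added as a suggestion and not part of the original workflow:
* the margin is leader minus runner-up, so it should never be negative.
* A nonzero count here would mean the within-date sort order was lost.

count if pollmar < 0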

* Keep one observation per poll date (winner and pollmar are constant within date)

duplicates drop polldate, force

* Misleading poll: 1 unless Trump (the eventual nominee) leads by at least
* 1 point; left missing when the margin itself is missing

gen byte misleading = !(winner == "trump" & pollmar >= 1) if !missing(pollmar)

* Drop election-day and undated polls, then save

drop if time == 0 | missing(time)

keep polldate pollmar misleading time

save "repprim2016_polls.dta", replace